Kinetic model of DNA replication in eukaryotic organisms

We formulate a kinetic model of DNA replication that quantitatively describes recent results
on DNA replication in the in vitro system of Xenopus laevis prior to the mid-blastula transition. 
The model describes well a large amount of different data within a simple theoretical framework. 
This allows one, for the first time, to determine the parameters governing the DNA replication 
program in a eukaryote on a genome-wide basis. In particular, we have determined the frequency 
of origin activation in time and space during the cell cycle. Although we focus on a specific stage 
of development, this model can easily be adapted to describe replication in many other organisms, 
including budding yeast.

Introduction

Although the organization of the genome for DNA 
replication varies considerably from species to species, 
the duplication of most eukaryotic genomes shares a num- 
ber of common features: 

1) DNA is organized into a sequential series of replication
units, or replicons, each of which contains a single origin
of replication.

2) Each origin is activated not more than once during the 
cell-division cycle. 

3) DNA synthesis propagates at replication forks
bidirectionally from each origin. [3]

4) DNA synthesis stops when two newly replicated re- 
gions of DNA meet. 

Understanding how these parameters are coordinated 
during the replication of the genome is essential for elu- 
cidating the mechanism by which S-phase is regulated in 
eukaryotic cells. In this article, we formulate a stochastic 
model based on these observations that yields a mathe- 
matical description of the process of DNA replication and 
provides a convenient way to use the full statistics gath- 
ered in any particular replication experiment. It allows 
one to deduce accurate values for the parameters that 
regulate DNA replication in the Xenopus laevis replica- 
tion system, and it can be generalized to describe replica- 
tion in any other eukaryotic system. This type of model 
has also been shown to apply to the case of RecA polymerizing
on a single molecule of DNA. The model, as
described in the methods section below, turns out to be 
formally equivalent to a well-known stochastic descrip- 
tion of the kinetics of crystal growth, which allows us to 



draw on a number of previously derived results and, per- 
haps equally important, suggests a vocabulary that we 
find useful and intuitive for understanding the process of 
replication. 

Since the kinetics of DNA replication in any cell system
depend on two fundamental quantities, replication fork
velocity and initiation frequency, one of the principal
goals of this kind of analysis is to derive accurate values
for these quantities, including any variation during the
course of S-phase. As replicon size and the duration of
S-phase depend on the values of these parameters, this
information is indispensable for understanding the
mechanisms regulating S-phase in any given cell system.



Results 

Summary of the X. laevis replication experiment

Here, we describe recent experimental results obtained on
the kinetics of DNA replication in the well-characterized
Xenopus laevis cell-free system. [13,14] One of the main
goals of this paper is to show that, using the theoretical
approach described below, one can extract more information,
and more reliably, than before from such experiments.

In the Xenopus replication experiments, fragments of DNA
that have completed one cycle of replication are stretched
out on a glass surface using molecular combing. [15-17]
Typical two-color epifluorescence images of the combed DNA
are shown in Fig. 1. The DNA that has replicated prior to
some chosen time t is labeled with a single fluorescent dye,
while DNA that replicated after that time is labeled with
two dyes. The result is a series of samples, each of which
corresponds to a different time t during S-phase. Using an
optical microscope, one can directly measure eye, hole, and
eye-to-eye lengths at that time. We can thus monitor the
evolution of genome duplication from time point to time
point, as DNA synthesis advances. (See Fig. 2.)

FIG. 1: A fluorescence micrograph (bar = 20 μm). Early
replicating sequences labeled with biotin-dUTP are visualized
using red fluorescing antibodies (Texas Red). Later
replicating sequences are in addition labeled with dig-dUTP
and visualized using green (FITC) fluorescing antibodies.

Cell-free extracts of eggs from Xenopus laevis support the
major transitions of the eukaryotic cell cycle, including
complete chromosome replication under normal cell-cycle
control, and offer the opportunity to study the way that DNA
replication is coordinated within the cell cycle. In the
experiment, cell extract was added at t = 2 min, and S-phase
began 15 to 20 min later. DNA replication was monitored by
incorporating two different fluorescent dyes into the newly
synthesized DNA. The first dye was added before the cell
enters S-phase in order to label the entire genome. The
second dye was added at successive time points t = 25, 29,
32, 35, 39, and 45 min, in order to label the later
replicating DNA. DNA taken from each time point was combed,
and measurements were made on replicated and unreplicated
regions. The experimental details are described elsewhere
[13], but the approach is similar to DNA fiber
autoradiography, a method that has been in use for the last
30 years. [1,18] Indeed, the same approach has recently been
adapted to study the regulatory parameters of DNA
replication in HeLa cells. [20] Molecular combing, however,
has the advantage that a large amount of DNA may be extended
and aligned on a glass slide, which ensures significantly
better statistics (over several thousand measurements
corresponding to several hundred genomes per coverslip).
Indeed, the molecular combing experiments provide, for the
first time, easy access to the quantities of data necessary
for testing models such as the one advanced in this paper.

FIG. 2: Schematic representation of labeled and combed DNA
molecules. Since replication initiates at multiple dispersed
sites throughout the genome, the DNA can be differentially
labeled, so that each linearized molecule contains
alternating subregions stained with either one or both dyes.
The bubbles correspond to sequences synthesized in the
presence of a single dye (red). The green segments correspond
to those sequences that were synthesized after the second
dye (green) was added. The result is an unambiguous
distinction between eyes and holes (earlier and later
replicating sequences) along the linearized molecules.
Replication is assumed to have begun at the midpoints of the
bubble sequences and to have proceeded bidirectionally from
the site where DNA synthesis was initiated. Measurements
between the centers of adjacent eyes provide information
about replicon sizes (eye-to-eye distances). The fraction of
the molecule already replicated by a given time, f(τ), is
determined by summing the lengths of the bubbles and dividing
that by the total length of the respective molecule.



Generalization of the model to account for specific 
features of the X. laevis experiment 

The experimental results obtained on the kinetics of DNA
replication in the in vitro cell-free system of Xenopus
laevis [13,14] were analyzed using the kinetic model
developed below. In formulating that model, we found that we
had to take into account explicitly a number of observations
that are peculiar to the particular experiment analyzed:

1) One goal of the experiment is to measure the initiation
function I(τ), which is the probability of initiating an
origin at time τ, per unit time and per unit length of
unreplicated DNA. The simplest assumptions, in terms of our
model, would be that either I is peaked at or near τ = 0
(all origins initiated at the beginning of S-phase) or
I(τ) = constant (origins initiated at a constant rate
throughout S-phase). However, neither assumption turns out
to be consistent with the data analyzed here; thus, we
formulated our model to allow for arbitrary initiation
patterns and deduced an estimate for I(τ) directly from the
data. We note that initiation is believed to occur
synchronously during the first half of S-phase in Drosophila
melanogaster early embryos. [10,21] Initiation in the
myxomycete Physarum polycephalum, on the other hand, occurs
in a very broad temporal window, suggesting that initiation
occurs continuously throughout S-phase. [5] Finally, recent
observations suggest that in Xenopus laevis early embryos,
nucleation may occur with increasing frequency as DNA
synthesis advances. By choosing an appropriate form for
I(τ), one can account for any of these scenarios. Below, we
show how measured quantities may, using the model, be
inverted to provide an estimate for I(τ).

2) The basic form of the model assumes implicitly that the
DNA analyzed began replication at τ = 0, but this may not be
so, for two reasons:

i) In the experimental protocols, the DNA analyzed comes
from approximately 20,000 independently replicating nuclei.
Before each genome can replicate, its nuclear membrane must
form, along with, presumably, the replication factories.
This process takes 15-20 minutes. [22] Because the exact
amount of time can vary from cell to cell, the DNA analyzed
at time t in the laboratory may have started replicating
over a relatively wide range of times.

ii) In eukaryotic organisms, origin activation may be
distributed in a programmed manner throughout the length of
S-phase, and, as a consequence, each origin is turned on at
a specific time (early or late).

In the current experiment, the lack of information about the
locations of the measured DNA segments along the genome
means that we cannot distinguish between asynchrony due to
reasons (i) or (ii). We can, however, account for their
combined effects by introducing a starting-time distribution
φ(t'), which is the probability, for whatever reason, that a
given piece of analyzed DNA began replicating at time t' in
the lab. Using our model, we can directly extract the
starting-time distribution from the data.

3) The models described above assumed that statistics 
could be calculated on infinitely long segments of DNA. 
In the experimental approach, the combed DNA is bro- 
ken down into relatively short segments (100 kb, typi- 
cally). Although it is difficult to account for this effect 
analytically, we wrote a Monte-Carlo simulation that can 
mimic such "finite-size" effects. 

4) The experiments are all analyzed using an epifluorescence
microscope to visualize the fluorescent tracks of combed DNA
on glass slides. The spatial resolution (≈ 0.3 μm) means
that smaller signals will not be detectable. Thus, two
replicated segments separated by an unreplicated region of
size < 0.3 μm will be falsely assumed to be one longer
replicated segment. We accounted for this in the Monte-Carlo
simulations by calculating statistics on a coarse lattice
whose size equalled the optical resolution, while the
simulation itself takes place on a finer lattice.
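
The finite-size and resolution effects in points 3 and 4 can
be illustrated with a small lattice simulation. The Python
sketch below is not the Monte-Carlo code used for the
analysis; the function names (simulate_segment, coarse_grain,
domain_lengths), the majority-vote coarse-graining rule, and
the parameter values in the demo are assumptions chosen only
for illustration.

```python
import random


def simulate_segment(n_sites, i0, v_sites, n_steps, rng):
    """Replicate one finite DNA segment on a fine lattice.

    n_sites : number of fine lattice sites in the segment
    i0      : initiation probability per unreplicated site per step
    v_sites : sites each fork advances per step
    Returns a list of booleans (True = replicated).
    """
    replicated = [False] * n_sites
    for _ in range(n_steps):
        # Growth: each replicated site spreads v_sites in both directions.
        grown = replicated[:]
        for x, done in enumerate(replicated):
            if done:
                lo = max(0, x - v_sites)
                hi = min(n_sites, x + v_sites + 1)
                for y in range(lo, hi):
                    grown[y] = True
        replicated = grown
        # Initiation: only unreplicated sites can fire.
        for x in range(n_sites):
            if not replicated[x] and rng.random() < i0:
                replicated[x] = True
    return replicated


def coarse_grain(replicated, block):
    """Mimic optical resolution: majority vote over blocks of fine sites
    (an assumed rule; ties count as unreplicated)."""
    coarse = []
    for start in range(0, len(replicated) - block + 1, block):
        chunk = replicated[start:start + block]
        coarse.append(2 * sum(chunk) > block)
    return coarse


def domain_lengths(states):
    """Return (eye_lengths, hole_lengths) in lattice units.

    The first and last runs are dropped because they are cut off by the
    ends of the segment (the finite-size effect).
    """
    runs = []
    for s in states:
        if runs and runs[-1][0] == s:
            runs[-1][1] += 1
        else:
            runs.append([s, 1])
    interior = runs[1:-1]
    eyes = [n for s, n in interior if s]
    holes = [n for s, n in interior if not s]
    return eyes, holes


if __name__ == "__main__":
    rng = random.Random(0)
    fine = simulate_segment(n_sites=2000, i0=0.002, v_sites=1,
                            n_steps=15, rng=rng)
    coarse = coarse_grain(fine, block=4)
    for label, states in (("fine", fine), ("coarse", coarse)):
        eyes, holes = domain_lengths(states)
        print(label, "lattice:", len(eyes), "eyes,", len(holes), "holes")
```

Comparing the eye and hole counts on the fine and coarse
lattices shows how domains below the resolution are merged or
lost, while dropping the end runs shows how long domains are
under-counted on short segments.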



Application of the kinetic model to the analysis of
DNA replication in X. laevis

Using the generalizations discussed above, we analyzed
recent results obtained on DNA replication in the Xenopus
laevis cell-free system. DNA taken from each time point was
combed, and measurements were made on replicated and
unreplicated regions. Statistics from each time point were
then compiled into six histograms (one for each time point)
of the distribution ρ(f, t) of replicated fractions f at
time t (Fig. 3).

FIG. 3: ρ(f, t) distributions for the 6 time points. The
curves show the probability that a molecule at a given time
point (A-F) has undergone a certain amount of replication
before the second dye was added. The filled circles represent
the experimental data. The results of the Monte-Carlo
simulation are shown in open circles; analytical curves are
from the global fit.

One can immediately see from Fig. 3 the need to account for
the spread in starting times. If all the segments of DNA
that were analyzed had started replicating at the same time,
then the distributions would have been concentrated over a
very small range of f. But, as one can see in Fig. 3, some
segments of DNA (within the same time point) have already
finished replicating (f = 1) before others have even started
(f = 0). This spread is far larger than would be expected on
account of the finite length of the segments analyzed.
Because of the need to account for the spread in starting
times, it is simpler to begin by sorting data by the
replicated fraction f of the measured segment. We thus
assume that all segments with a similar fraction f are at
roughly the same point in S-phase, an assumption that we can
check by partitioning the data into subsets and redoing our
measurements on the subsets. In Fig. 4, we plot the mean
values ℓ_h, ℓ_i, and ℓ_i2i against f.

FIG. 4: Mean quantities vs. replication fraction. (A) average
hole size ℓ_h(f); (B) average eye size ℓ_i(f); (C) average
eye-to-eye size ℓ_i2i(f). Filled circles are data; open
circles are from the Monte-Carlo simulation; the solid curve
is a least-squares fit, based on a two-segment I(τ);
(D) curves in (A)-(C) collapsed onto a single plot,
confirming the mean-field hypothesis. (The discrepancies near
f = 0 and 1 reflect measurement errors. Very small eyes or
holes may be missed because of limited optical resolution;
very large eyes or holes may be eliminated because of finite
segment sizes.)

We then find f(τ), I(τ), and the integrated initiation
density I_tot(τ), whose inverse is the mean distance between
origins of replication activated up to time τ. (See Fig. 5.)
The direct inversion for I(τ) (Fig. 5B) shows several
surprising features: First, origin activation takes place
throughout S-phase and with increasing probability (measured
relative to the amount of unreplicated DNA), as recently
inferred from a cruder analysis of data from the same system
using plasmid DNA. Second, about halfway through S-phase,
there is a marked increase in initiation rate, an
observation that, if confirmed, would have biological
significance. It is not known what might cause a sudden
increase (break point) in initiation frequency halfway
through S-phase. The increase could reflect a change in
chromatin structure that may occur after a given fraction of
the genome has undergone replication. This in turn may
increase the number of potential origins as DNA synthesis
advances.

The smooth curves in Fig. 4A-C are fits based on the model,
using an I(τ) that has two linearly increasing regions, with
arbitrary slopes and "break point" (three free parameters).
The fits are quite good, except where the finite size of the
combed DNA fragments becomes relevant. For example, when
mean hole, eye, and eye-to-eye lengths exceed about 10% of
the mean fragment size, larger segments in the distribution
for ℓ_h(f), etc., are excluded and the averages are biased
down. We confirmed this with the Monte-Carlo simulations,
the results of which are overlaid on the experimental data.
The finite fragment size in the simulation matches that of
the experiment, leading to the same downward bias. In
Fig. 5, we overlay the fits on the experimental data. We
emphasize that we obtain I(τ) directly from the data, with
no fit parameters, apart from an overall scaling of the time
axis. The analytical form is just a model that summarizes
the main features of the origin-initiation rate we
determine, via our model, from the experimental data. The
important result is I(τ). From the maximum of I_tot(τ), we
find a mean spacing between activated origins of 6.3 ±
0.3 kb, which is much smaller than the minimum mean
eye-to-eye separation of 14.4 ± 1.5 kb.

FIG. 5: (A) Fraction of replication completed, f(τ). Red
points are derived from the measurements of mean hole, eye,
and eye-to-eye lengths. Black curve is an analytic fit (see
text). (B) Initiation rate I(τ). The large statistical
scatter arises because the data points are obtained by
taking two numerical derivatives of the f(τ) points in A.
(C) Integrated origin separation, I_tot(τ), whose inverse
gives the average distance between all origins activated up
to time τ. In A-C, the black curves are from fits that
assume that I(τ) has two linear regimes of different slopes.
The form we chose for I(τ) was the simplest analytic form
consistent with the data in B. The parameters for the
least-squares fits (slopes I_1 and I_2, break point τ_1) are
obtained from a global fit to the eight data sets in
Fig. 3A-F and Fig. 4A-B, i.e., ρ(f) from six time points,
ℓ_h(f), and ℓ_i(f).
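
The inversion of f(τ) into I(τ) by two numerical derivatives
can be sketched as follows. The relation used here,
I(τ) = (1/2v) d²/dτ² [−ln(1 − f(τ))], is an assumption on our
part: it is the standard one-dimensional KJMA result for a
spatially uniform, time-dependent initiation rate, and it
reduces to f = 1 − exp(−I_0 v τ²) when I is constant. The
function name and the synthetic test data are hypothetical
and are not the experimental values.

```python
import math


def initiation_rate(taus, fs, v):
    """Estimate I(tau) at the interior points of a uniform time grid.

    Assumes I(tau) = (1 / (2 v)) * d^2/dtau^2 [ -ln(1 - f(tau)) ] and
    takes the second derivative by central differences.
    """
    h = taus[1] - taus[0]
    g = [-math.log(1.0 - f) for f in fs]
    rates = []
    for k in range(1, len(g) - 1):
        second = (g[k - 1] - 2.0 * g[k] + g[k + 1]) / h ** 2
        rates.append((taus[k], second / (2.0 * v)))
    return rates


if __name__ == "__main__":
    # Synthetic check: a linearly increasing rate I(tau) = a * tau gives
    # -ln(1 - f) = v * a * tau**3 / 3, so the inversion should return a * tau.
    a, v = 1.0e-4, 0.6  # illustrative values (per kb per min^2, kb/min)
    taus = [float(k) for k in range(1, 21)]
    fs = [1.0 - math.exp(-v * a * t ** 3 / 3.0) for t in taus]
    for t, rate in initiation_rate(taus, fs, v)[:3]:
        print(f"tau = {t:4.1f} min   I = {rate:.3e}   a*tau = {a * t:.3e}")
```

With noisy measured f(τ) values, the second difference
amplifies the scatter, which is consistent with the large
statistical scatter noted for Fig. 5B.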

In our model, the two quantities differ if initiation takes
place throughout S-phase, as coalescence of replicated
regions leads to fewer domains, and hence fewer inferred
origins (see the note below Eq. 5). The mean eye-to-eye
separation is of particular interest because its inverse is
just the domain density (number of active domains per
length), which can be used to estimate the number of active
replication forks at each moment during S-phase. For
example, the saturation value of I_tot corresponds to the
maximum number (about 480,000/genome) of active origins of
replication. Since there are about 400 replication foci/cell
nucleus, this would indicate a partitioning of approximately
1,200 origins (or, equivalently, about 7.5 Mb) per
replication focus. [22,27] The distribution of f values in
the ρ(f, t) plots can be used to deduce the starting-time
distribution φ(t'), along with the fork velocity v (Fig. 6).
The spread in starting times φ is consistent with a Gaussian
distribution, with a mean of 15.9 ± 0.6 min and a standard
deviation of 6.1 ± 0.6 min. For the fork velocity, we find
v = 615 ± 35 bases/min, in excellent agreement with previous
estimates. As with the fits described above, we extracted
φ(t') and v from a global fit to data from all six time
points.
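
The per-focus estimate above is simple arithmetic on numbers
quoted in the text. The short Python check below reproduces
it; the variable names are ours, and the "implied genome
size" line is only a consistency check obtained by
multiplying two quoted values.

```python
origin_spacing_kb = 6.3        # mean spacing between activated origins (text)
origins_per_genome = 480_000   # maximum number of active origins (text)
foci_per_nucleus = 400         # replication foci per nucleus (text)

origins_per_focus = origins_per_genome / foci_per_nucleus
dna_per_focus_mb = origins_per_focus * origin_spacing_kb / 1000.0
implied_genome_gb = origins_per_genome * origin_spacing_kb / 1.0e6

print(f"origins per focus:   {origins_per_focus:.0f}")
print(f"DNA per focus:       {dna_per_focus_mb:.2f} Mb")
print(f"implied genome size: {implied_genome_gb:.2f} Gb")
```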



Discussion 

Initiation throughout S-phase 

The view that we are led to here, of random initiation
events occurring continuously during the replication of
Xenopus sperm chromatin in egg extracts, is in striking
contrast to what has until recently been the accepted view
of a regular periodic organization of replication origins
throughout the genome. For a discussion of experiments that
raise doubts about such a view, see Berezney. [32] The
application of our model to the results of Herrick et al.
indicates that the kinetics of DNA replication in the
X. laevis in vitro system closely resembles that of genome
duplication in early embryos. Specifically, we find that the
time required to duplicate the genome in vitro agrees well
with what is observed in vivo. In addition, the model yields
accurate values for replicon sizes and replication fork
velocities that confirm previous observations. Though
replication in vitro may differ biologically from what
occurs in vivo, the results nevertheless demonstrate that
the kinetics remains essentially the same. Of course, the
specific finding of an increasing rate of initiation invites
a biological interpretation involving a kind of
autocatalysis, whereby the replication process itself leads
to the release of a factor whose concentration determines
the rate of initiation. This will be explored in future
work.

FIG. 6: Starting-time distribution φ(t'). Solid curve is a
least-squares fit to a Gaussian distribution.



Directions for future experiments in X. laevis 

One effect that we did not include in our analysis is a
variable fork velocity. For example, v might decrease as
forks coalesce or as replication factors become limiting
toward the end of S-phase. Such effects, if present, are too
small to see in the data analyzed here.

Another important question is to separate the effects of 
any intrinsic distribution due to early and late-replicating 
regions of the genome of a single cell from the extrinsic 
distribution caused by having many cells in the experi- 
ment. One approach would be to isolate and comb the 
DNA from a single cell. Although difficult, such an exper- 
iment is technically feasible. The latter problem could be 
resolved by in situ fluorescence observations of the chosen 
cell. 



Applications to other systems 

One can entertain many further applications of the basic
model discussed above, which can be generalized, if need be.
For example, Blumenthal et al. interpreted their results on
replication in Drosophila melanogaster for ρ_i2i(ℓ, f) to
imply periodically spaced origins in the genome. [21] (See
their Fig. 7.) It is difficult to judge whether their peaks
are real or statistical happenstance, but if the conclusion
is indeed that the origins in that system are arranged
periodically, the kinetics model could be generalized in a
straightforward way (introducing an I(x, τ) that was
periodic in x).

Very recently, detailed data on the replication of budding
yeast (Saccharomyces cerevisiae) have become available. [33]
The data provide information on the locations of origins and
the timings of their initiation during S-phase. These data
support the view of origin initiation throughout S-phase.
Unlike replication in Xenopus prior to the mid-blastula
transition, origins in budding yeast are associated with
highly conserved sequence elements (autonomous replication
sequence elements, or ARSs). Raghuraman et al. [33] also
give the first estimates of the distribution of fork
velocities during replication. Although broad, the
distribution is apparently stationary, and there is no
correlation between velocities and the time in S-phase when
the forks are initiated. The model developed here could be
generalized in a straightforward way to the case of budding
yeast. Knowing the sequence of the genome and hence the
location of potential origins means that the initiation
function would be an explicit function of position x along
the genome, with peaks of varying heights at each potential
origin. The advantage of the kind of modeling advanced here
would be the opportunity to derive quantities such as the
replication fraction as a function of time in S-phase.
Raghuraman et al. fit their data for this "timing curve" to
an arbitrarily chosen sigmoidal function. (See their
supplementary data, Section II-5.) Such modeling will make
it easier to find meaningful biological explanations of the
programming of S-phase evolution.



The origin-spacing problem 

One outstanding issue in DNA replication in eukaryotes is
the observation that the replication origins cannot be too
far apart, as this would prevent the genome from being
replicated completely within the length of a single S-phase.
[34] One solution that has been proposed is that there is an
excess of pre-replication complexes (pre-RCs) of highly
conserved proteins, which assemble at ORC-bound DNA sites
before the cell enters S-phase (e.g., Lucas et al. [14], and
references therein). In this case, the position of each
potential origin of replication (POR) can be distributed
randomly, with a statistically insignificant probability of
having large gaps between PORs. The problem with this
solution is that the average POR spacing must be much
smaller (less than 1-2 kb) than the reported values of XORC
spacing of 7-16 kb. [35]

A second proposed solution to the origin-spacing problem is
to invoke correlations in POR spacings. In other words,
instead of assuming a purely random pre-RC distribution, one
imposes constraints that force a partial periodicity on the
POR spacing, so that most of the origins are spaced 5-15 kb
apart (Blow et al. [36] and references therein). This
suppresses the formation of large gaps but raises other
issues. First, it requires an unknown mechanism to achieve
this periodicity of POR spacing. Second, it assumes
implicitly that most of the PORs fire during S-phase, to
prevent the 30 kb gap that could arise from an origin's
failure to initiate, which is not obvious at all. Third, if
origins initiate throughout S-phase, then there needs to be
some kind of correlation that forces the more widely spaced
origin groups to initiate early enough in S-phase to
complete replication in the required time.

Implicitly, our model adopts language consistent with the
first solution, but it is straightforward to consider the
correlations assumed in the second solution. The presence of
significant correlations in PORs would not invalidate the
results presented here, which pertain to mean quantities
(e.g., Fig. 4); however, it would change their
interpretation and could change biological models that one
might try to make to explain the observed kinetic parameters
we extract using the KJMA model. We plan to investigate
these questions, along with the effect of origin efficiency
on DNA replication kinetics, in future work.



Conclusion 

In this article, we have introduced a class of theoret- 
ical models for describing replication kinetics that is in- 
spired by well-known models of crystal-growth kinetics. 
The model allows us to extract the rate of initiation of 
new origins, a quantity whose time dependence has not 
previously been measured. With remarkably few param- 
eters, the model fits quantitatively the most detailed ex- 
isting experiment on replication in Xenopus. It repro- 
duces known results (for example, the fork velocity) and 
provides the first reliable description of the temporal or- 
ganization of replication initiation in a higher eukaryote. 
Perhaps most important, the model can be generalized in 
a straightforward way to describe replication and extract 
relevant parameters in essentially any organism. 

Methods 

Mathematical analogy between crystal growth and 
the kinetics of DNA replication 

In this section, we describe how certain features of the
mathematics describing crystal growth may be mapped onto a
model describing the kinetics of DNA replication. We
emphasize that the analogy is a formal one: the underlying
processes are completely different. However, by mapping our
problem onto one that has been long studied in a different
context, we can take over a number of results that have
already been derived, and we can develop useful intuitions
about how to look at experimental results about DNA
replication.

In the 1930s, several scientists independently derived a
stochastic model that described the kinetics of crystal
growth. [37-39] The "Kolmogorov-Johnson-Mehl-Avrami" (KJMA)
model has since been widely used by metallurgists and other
scientists to analyze thermodynamic phase transformations.
[40]

In the KJMA model, freezing kinetics result from three
simultaneous processes:

1) nucleation, which leads to discrete solid domains.

2) growth of the domain.

3) coalescence, which occurs when two expanding domains
merge.

Each of these processes has an analog in DNA replication in
higher eukaryotes, and more specifically embryos:

1) The activation of an origin of replication is analogous
to the nucleation of the solid domains during crystal
growth.

2) Symmetric bidirectional DNA synthesis initiated
(nucleated) at the origin corresponds to solid-domain
growth.

3) Coalescence in crystal growth is analogous to multiple
dispersed sites of replicating DNA (replication forks) that
advance from opposite directions until they merge.



Simple version of the KJMA model for DNA 
replication 

In the simplest form of the KJMA model, solids nucleate
anywhere in the liquid, with equal probability for all
spatial locations ("homogeneous nucleation"), although it is
straightforward to describe nucleation at pre-specified
sites ("heterogeneous nucleation"), which would correspond
to a case where replication origins are specified by fixed
genetic sites along the genome. Once a solid domain has been
nucleated, it grows out as a sphere at constant velocity v.
When two solid domains impinge, growth ceases at the point
of contact, while continuing elsewhere. KJMA used elementary
methods to calculate quantities such as f(τ), the fraction
of the volume that has crystallized by time τ. Much later,
more sophisticated methods were developed to describe the
detailed statistics of domain sizes and spacings. [41,42]

DNA replication, of course, corresponds to one-dimensional
crystal growth; the shape in three dimensions of the
one-dimensional DNA strand does not directly affect the
kinetics modeling. (In the model, replication is one
dimensional along the DNA. The configuration of DNA in three
dimensions is not directly relevant to the model but can
enter indirectly via the nucleation function I(x, τ). For
example, if, for steric reasons, certain regions of the DNA
are inaccessible to replication factories, those regions
would have a lower (or even zero) value of I.) The
one-dimensional version of the KJMA model assumes that
domains grow out at velocity v, assumed to remain constant.
The nucleation rate I(x, τ) = I_0 is defined to be the
probability of domain formation per unit length of
unreplicated DNA per unit time, at the position x and time
τ. Following the analogy to the one-dimensional KJMA model,
we can calculate the kinetics of DNA replication during
S-phase. This requires determining the fraction of the
genome f(τ) that has already been replicated at any given
moment during S-phase. One finds

    f(\tau) = 1 - e^{-I_0 v \tau^2}    (1)

which defines a sigmoidal curve. (Eq. 1 assumes an infinite
genome length. The relative importance of the finite size of
chromosomes is set by the ratio (fork velocity × duration of
S-phase) / chromosome length (Cahn, 1996). In the case of
the experiment analyzed in this paper, this ratio is
≈ 10 bases/sec × 1000 sec / 10^7 bases/chromosome ≈ 10^-3,
which we neglect.)

A more complete description of replication kinetics requires
detailed analysis of different statistical quantities,
including measurements made on replicated regions (eyes),
unreplicated regions (holes), and eye-to-eye sizes (the
eye-to-eye size is defined as the length between the center
of one eye and the center of a neighboring eye). The
probability distributions may be expressed as functions
either of time τ or replicated fraction f. For example, the
distribution of holes of size ℓ at time τ, ρ_h(ℓ, τ), can be
derived by a simple extension of the argument leading to
Eq. 1:

    \rho_h(\ell, \tau) = I_0 \tau \, e^{-I_0 \tau \ell}    (2)

From Eq. 2, the mean size of holes at time τ is

    \ell_h(\tau) = \frac{1}{I_0 \tau}    (3)
Determining the probability distributions of replicated
lengths (eye sizes) is complicated because a given replicated
length may come from a single origin or it may
result from the merger of two or more replicated regions.
Thus, one must calculate in effect an infinite number of
probabilities; by contrast, holes of a given length arise
in only one way [42]. One can nonetheless derive a simple
expression for $\ell_i(\tau)$, the mean replicated length at
time $\tau$, from a "mean-field" hypothesis: the probability
distribution of a given replicated length is assumed
to be independent of the actual size of its neighbor. One
can show that this mean-field hypothesis must always be
true in one-dimensional growth problems, but not necessarily
in the ordinary three-dimensional setting of crystal
growth. In particular, if the initiation rate $I$ depends on
space, one expects correlations to be important. Using the
mean-field hypothesis, we find

$$\ell_i(\tau) = \ell_h(\tau)\,\frac{f}{1-f} = \frac{e^{I_0 v \tau^2} - 1}{I_0 \tau} \tag{4}$$

and

$$\ell_{i2i}(\tau) = \ell_i(\tau) + \ell_h(\tau) = \frac{\ell_h(\tau)}{1-f} = \frac{e^{I_0 v \tau^2}}{I_0 \tau} . \tag{5}$$



The minimum average eye-to-eye size, obtained by differentiating
Eq. 5, is $\ell^{*}_{i2i} = \sqrt{2e} \cdot \sqrt{v/I_0}$. These expressions
for $\ell_i(\tau)$ and $\ell_{i2i}(\tau)$ allow one to collapse the experimental
observations of $\ell_h$, $\ell_i$, and $\ell_{i2i}$ (the mean eye-to-eye
separation) onto a single curve. (See Fig. below.)
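
For readers who want the intermediate step, the minimization can be written out explicitly. The short derivation below starts from the constant-rate expression for the mean eye-to-eye size, $\ell_{i2i}(\tau) = e^{I_0 v \tau^2}/(I_0 \tau)$; the symbol $\tau^{*}$ for the time at which the minimum occurs is introduced here only for illustration.

```latex
\begin{align*}
\ell_{i2i}(\tau) &= \frac{e^{I_0 v \tau^2}}{I_0 \tau}, \\
\frac{d\ell_{i2i}}{d\tau}
  &= \frac{e^{I_0 v \tau^2}}{I_0 \tau^2}\left(2 I_0 v \tau^2 - 1\right) = 0
  \;\Longrightarrow\; \tau^{*} = \frac{1}{\sqrt{2 I_0 v}}, \\
\ell^{*}_{i2i} &= \ell_{i2i}(\tau^{*})
  = \frac{e^{1/2}\,\sqrt{2 I_0 v}}{I_0}
  = \sqrt{2e}\,\sqrt{\frac{v}{I_0}} .
\end{align*}
```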

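Finally, we can calculate the average distance between
origins of replication that were activated at different
times during the replication process, which is just the
inverse of $I_{tot}$, the time-integrated nucleation probability
per unit length:

$$\ell_0 = \frac{1}{I_{tot}} = \left[ \int_0^{\infty} I_0 \, [1 - f(\tau)] \, d\tau \right]^{-1} = \frac{2}{\sqrt{\pi}} \sqrt{\frac{v}{I_0}} . \tag{6}$$
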

The last expression shows that, as might have been
guessed by dimensional analysis of the model parameters
($I_0$ and $v$), the basic length scale in the model is
set by $\ell^{*} = \sqrt{v/I_0}$. Note that because initiation in the
model is occurring throughout S-phase, the minimum
eye-to-eye distance $\ell_{i2i\text{-min}}$ is not the same as the average
separation between origins, $\ell_0$. For this simple case,
$\ell_{i2i\text{-min}}/\ell_0 = \sqrt{\pi e/2} \approx 2.1$.
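
The two length scales can also be checked numerically. The sketch below, written for the constant-rate case only, integrates $I_0 [1 - f(\tau)]$ over time to obtain the mean origin spacing and scans a grid of times for the smallest mean eye-to-eye size $e^{I_0 v \tau^2}/(I_0 \tau)$. The function names, the grid sizes, and the value of $I_0$ in any call are assumptions made for illustration.

```python
import math


def eye_to_eye(tau, i0, v):
    """Mean eye-to-eye size for constant I0: exp(I0 v tau^2) / (I0 tau)."""
    return math.exp(i0 * v * tau * tau) / (i0 * tau)


def min_eye_to_eye(i0, v, n_grid=20000):
    """Smallest mean eye-to-eye size found on a uniform grid of tau values."""
    tau_max = 3.0 / math.sqrt(i0 * v)
    return min(eye_to_eye(tau_max * k / n_grid, i0, v)
               for k in range(1, n_grid + 1))


def mean_origin_spacing(i0, v, n_grid=20000):
    """1 / I_tot, with I_tot = integral of I0 * (1 - f) (midpoint rule)."""
    tau_max = 10.0 / math.sqrt(i0 * v)  # integrand ~ exp(-100) at tau_max
    d_tau = tau_max / n_grid
    i_tot = d_tau * sum(
        i0 * math.exp(-i0 * v * ((k + 0.5) * d_tau) ** 2)
        for k in range(n_grid)
    )
    return 1.0 / i_tot


if __name__ == "__main__":
    i0, v = 0.01, 0.615  # I0 is a made-up value (per kb per min); v in kb/min
    print(min_eye_to_eye(i0, v) / mean_origin_spacing(i0, v))
```

The printed ratio does not depend on the particular $I_0$ and $v$ chosen, which is one way to see that $\sqrt{v/I_0}$ is the only length scale in the constant-rate model.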



Generalizations of the KJMA model 

Based on the specific results of the Xenopus experi- 
ments discussed above, we generalized the simple version 
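of the KJMA model in several ways: The first generalization
is to allow for arbitrary $I(\tau)$. Eq. 1 then becomes

$$f(\tau) = 1 - e^{-g(\tau)} \quad \text{with} \quad g(\tau) = 2v \int_0^{\tau} I(\tau')\,(\tau - \tau')\,d\tau' , \tag{7}$$

and, similarly, Eq. 3 becomes

$$\ell_h(\tau) = \left[ \int_0^{\tau} I(\tau')\,d\tau' \right]^{-1} . \tag{8}$$
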



The other mean lengths, $\ell_i(\tau)$ and $\ell_{i2i}(\tau)$, continue to be
related to $\ell_h(\tau)$ by the general expressions given in Eqs.
4 and 5. In the experiment, one measures $\ell_h$, $\ell_i$, and $\ell_{i2i}$
as functions of both $\tau$ and $f$. (Because of the start-time
ambiguity, the $f$ data are easier to interpret.) The goal
is to invert these data to find $I(\tau)$. Using Eqs. 7 and 8,
we find

$$\tau(f) = \frac{1}{2v} \int_0^{f} \ell_{i2i}(f')\,df' = \frac{1}{2v} \int_0^{f} \frac{\ell_h(f')}{1-f'}\,df' . \tag{9}$$




Because $\tau(f)$ increases monotonically, one can numerically
invert it to find $f(\tau)$. From $f(\tau)$, one can derive all
quantities of interest, including $I(\tau)$.
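
The inversion can be sketched in a few lines of Python. The example below is a minimal illustration, not the analysis code used for the experiments: it integrates $\ell_h(f')/[2v(1-f')]$ over $f'$ with the trapezoid rule to obtain $\tau(f)$, then estimates $I(\tau)$ as the time derivative of $1/\ell_h$, which follows from $\ell_h(\tau) = [\int_0^{\tau} I(\tau')\,d\tau']^{-1}$. The function names and the `tau_start` offset are assumptions; the offset is needed because the integrand can diverge (integrably) as $f \to 0$, so the sum starts at the first tabulated point.

```python
import math


def tau_of_f(f_values, hole_lengths, v, tau_start=0.0):
    """Cumulative trapezoid integral of l_h / (2 v (1 - f)) over f."""
    integrand = [lh / (2.0 * v * (1.0 - f))
                 for f, lh in zip(f_values, hole_lengths)]
    taus = [tau_start]
    for k in range(1, len(f_values)):
        df = f_values[k] - f_values[k - 1]
        taus.append(taus[-1] + 0.5 * (integrand[k] + integrand[k - 1]) * df)
    return taus


def initiation_rate(taus, hole_lengths):
    """I(tau) = d(1/l_h)/d(tau), centered differences at interior points."""
    inv = [1.0 / lh for lh in hole_lengths]
    return [(inv[k + 1] - inv[k - 1]) / (taus[k + 1] - taus[k - 1])
            for k in range(1, len(taus) - 1)]


if __name__ == "__main__":
    # Synthetic constant-rate data; I0 is a made-up value, v in kb/min.
    i0, v = 0.01, 0.615
    true_tau = [2.0 + 0.1 * k for k in range(150)]
    f_vals = [1.0 - math.exp(-i0 * v * t * t) for t in true_tau]
    l_h = [1.0 / (i0 * t) for t in true_tau]
    taus = tau_of_f(f_vals, l_h, v, tau_start=true_tau[0])
    print(max(abs(a - b) for a, b in zip(taus, true_tau)))
    print(initiation_rate(taus, l_h)[:3])
```

With real data, the tabulated $\ell_h(f)$ would come from the measured hole sizes binned by replicated fraction, and the recovered $\tau(f)$ is defined only up to the unknown starting offset.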

The starting time distribution $\phi(\tau)$ can be deduced by
looking at each molecular fragment, measuring its replication
fraction $f$, and extrapolating back to a starting
time using the experimentally determined $f(\tau)$ curve.
(Fragments that are fully replicated ($f = 1$) are excluded.)
The starting times are then binned to give $\phi(\tau)$ directly.
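
One possible reading of this procedure is sketched below in Python. It assumes that each fragment is recorded as a pair (laboratory time point, replicated fraction), that the elapsed time is obtained by linear interpolation on a tabulated, strictly increasing replication curve, and that the starting time is the laboratory time minus that elapsed time. These choices, and all function names, are assumptions made for illustration rather than details given in the text.

```python
import math
from bisect import bisect_left
from collections import Counter


def tau_from_fraction(f, curve_tau, curve_f):
    """Invert a tabulated, strictly increasing f(tau) curve by interpolation."""
    if f <= curve_f[0]:
        return curve_tau[0]
    if f >= curve_f[-1]:
        return curve_tau[-1]
    j = bisect_left(curve_f, f)  # curve_f[j-1] < f <= curve_f[j]
    f0, f1 = curve_f[j - 1], curve_f[j]
    t0, t1 = curve_tau[j - 1], curve_tau[j]
    return t0 + (t1 - t0) * (f - f0) / (f1 - f0)


def starting_time_histogram(fragments, curve_tau, curve_f, bin_width):
    """Map bin index -> number of fragments whose start time falls in the bin."""
    counts = Counter()
    for t_lab, f in fragments:
        if f >= 1.0:
            continue  # fully replicated fragments carry no timing information
        start = t_lab - tau_from_fraction(f, curve_tau, curve_f)
        counts[math.floor(start / bin_width)] += 1
    return dict(counts)
```

Bin `k` covers starting times from `k * bin_width` up to, but not including, `(k + 1) * bin_width`; normalizing the counts by the number of retained fragments gives an estimate of the distribution.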



Monte-Carlo simulations 



We wrote a Monte-Carlo simulation using the pro- 
gramming language of Igor Pro (Wavemetrics) to test 
various experimental effects that were difficult to model 
analytically. These included the effects of finite sampling 
of DNA fragments (on average, 190 molecules per time 
point), the finite optical resolution of the scanned im- 
ages, and - most important - the effect of the finite size 
of the combed DNA fragments. The size of each molecu- 
lar fragment in the simulation was drawn randomly from 
an estimate of the actual size distribution of the experimental
data. This distribution was approximately log-normal,
with an average length of 102 kb and a standard
deviation of 75 kb.
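
Drawing fragment sizes from a log-normal distribution with a prescribed mean and standard deviation requires converting those moments to the parameters of the underlying normal distribution. The Python sketch below shows one way to do this with the standard library; it is an illustration only (the original simulation was written in Igor Pro), and the function names and seed are assumptions.

```python
import math
import random


def lognormal_params(mean, std):
    """Return (mu, sigma) of ln(X) so that X has the given mean and std."""
    sigma2 = math.log(1.0 + (std / mean) ** 2)
    mu = math.log(mean) - 0.5 * sigma2
    return mu, math.sqrt(sigma2)


def draw_fragment_sizes(n, mean_kb=102.0, std_kb=75.0, seed=0):
    """Draw n fragment sizes (kb) from the matching log-normal distribution."""
    rng = random.Random(seed)
    mu, sigma = lognormal_params(mean_kb, std_kb)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


if __name__ == "__main__":
    sizes = draw_fragment_sizes(50000)
    print(sum(sizes) / len(sizes))
```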

In the simulations, we consider each DNA molecular
fragment as a one-dimensional lattice, and each lattice
site is updated with a timestep $\Delta t = 0.2$ min. An origin
is initiated (lattice site changed from 0 to 1) with a probability
determined by the initiation rate $I(\tau)$. Once an
origin has been initiated, replication forks grow bidirectionally
at a constant rate $v$. The natural lattice spacing
is then $v\,\Delta t$, which is 123 bp for the measured fork
velocity $v = 615$ bp/min and the chosen time step $\Delta t$. The
lattice scale is then roughly the size of origin recognition
complex proteins. We sample the simulation results at
the same time points as the actual experiments ($t$ = 25,
29, 32, 35, 39, 45 min). Each sampled molecule is cut at
a random site to simulate the combing process. The lattice
is then "coarse grained" by averaging over approximately
four pixels. The coarse lattice length scale is then 0.24
μm, which roughly corresponds to that of the scanned optical
images. Finally, the coarse-grained fragments were
analyzed to compile statistics concerning replicon sizes,
eye-to-eye sizes, etc., which were directly compared to
experimental data.
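
The lattice update described above can be sketched as follows. This Python example is a simplified illustration and not the authors' Igor Pro code: it omits the random cutting and coarse-graining steps, applies fork growth before initiation within each time step (an assumed ordering), and takes the initiation rate as a user-supplied function whose form and value are hypothetical.

```python
import math
import random

DT_MIN = 0.2                      # time step from the text (min)
FORK_V_BP_MIN = 615.0             # fork velocity from the text (bp/min)
SITE_BP = FORK_V_BP_MIN * DT_MIN  # lattice spacing, 123 bp


def simulate_lattice(n_sites, n_steps, init_rate, rng):
    """Return (final lattice, replicated fraction after each step).

    init_rate(tau) gives initiations per bp per min (hypothetical form).
    """
    lattice = [0] * n_sites
    history = []
    for step in range(1, n_steps + 1):
        grown = lattice[:]
        # Fork growth: every replicated site converts its two neighbors.
        for i, state in enumerate(lattice):
            if state:
                if i > 0:
                    grown[i - 1] = 1
                if i + 1 < n_sites:
                    grown[i + 1] = 1
        # Initiation: only unreplicated sites can fire.
        p_init = init_rate(step * DT_MIN) * SITE_BP * DT_MIN
        for i in range(n_sites):
            if grown[i] == 0 and rng.random() < p_init:
                grown[i] = 1
        lattice = grown
        history.append(sum(lattice) / n_sites)
    return lattice, history


def run_lengths(lattice, value):
    """Lengths (in sites) of maximal runs equal to value: 1 eyes, 0 holes."""
    runs, current = [], 0
    for state in lattice:
        if state == value:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


if __name__ == "__main__":
    i0 = 1.0e-5  # made-up constant rate, per bp per min
    lattice, history = simulate_lattice(40000, 50, lambda tau: i0,
                                        random.Random(1))
    print(history[-1], 1.0 - math.exp(-i0 * FORK_V_BP_MIN * 10.0 ** 2))
```

For a constant rate, the simulated replicated fraction should scatter around the infinite-length prediction $1 - e^{-I_0 v \tau^2}$, and `run_lengths` gives the eye and hole sizes from which the mean lengths can be compiled.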

In a first version of the simulation, the lattice was di- 
rectly simulated using a vector with one element for each 
lattice site. In a more refined version of the simulation, 
we noted only the position of the replication forks, which 
greatly increased the speed of the simulations. 
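
A related way to avoid storing every lattice site is to work with intervals rather than sites. The Python sketch below is not the fork-tracking scheme described above but a different, commonly used shortcut for the constant-rate case: candidate initiation events are scattered uniformly in position and time, each is grown into an interval at velocity $v$, and overlapping intervals are merged. Candidates that fall in already replicated DNA produce intervals contained in existing eyes, so they do not change the result. All names and parameter values are assumptions.

```python
import math
import random


def simulate_eyes(length_kb, tau_min, i0, v, rng):
    """Merged replicated intervals (kb) at time tau for a constant rate i0."""
    intervals = []
    density = i0 * tau_min  # candidate events per kb over the whole time span
    x = rng.expovariate(density)
    while x < length_kb:
        t_init = rng.uniform(0.0, tau_min)
        half = v * (tau_min - t_init)
        intervals.append((max(0.0, x - half), min(length_kb, x + half)))
        x += rng.expovariate(density)
    intervals.sort()
    merged = []
    for lo, hi in intervals:
        if merged and lo <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


if __name__ == "__main__":
    # I0 is a made-up value (per kb per min); v = 0.615 kb/min.
    eyes = simulate_eyes(5000.0, 10.0, 0.01, 0.615, random.Random(2))
    fraction = sum(hi - lo for lo, hi in eyes) / 5000.0
    print(fraction, 1.0 - math.exp(-0.01 * 0.615 * 10.0 ** 2))
```

The cost scales with the number of initiation events rather than with the number of lattice sites, which is the same reason the fork-tracking version is faster than the site-by-site one.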

We also used the simulation to test a previous algorithm
for extracting $I(f)$, the initiation rate as a
function of overall replication fraction. The previous
algorithm [11,14] looked for small replicated regions and
extrapolated back to an assumed initiation time. We
tested this algorithm using our Monte-Carlo analysis and
found significant bias in the inferred $I(f)$, while the
algorithms we introduce here showed no such bias.

Parameter extraction from data

We extracted data from both the real experiments and
the Monte-Carlo simulations by a global least-squares fit
that took into account simultaneously the different data
collected (i.e., the different curves in Figs. 2 and 3). As
discussed above, we fit a two-segment straight line to the
$I(\tau)$ curve extracted directly from the data for analytic
simplicity. Assuming this form for $I(\tau)$, we derive explicit
formulae for the curves in Figs. 2 and 3.

The finite size of the molecular fragments studied 
(102 ± 75 kb) causes systematic deviation from the 
"infinite-length" formulae we can calculate. Such devi- 
ations could be detected using the Monte-Carlo simu- 
lations by comparing the extracted values of parameters 
with those input. The deviations show themselves mainly 
in two settings: First, whenever the mean length of holes, 
eyes, or eye-to-eye distances approaches the mean seg- 
ment length, the observed mean lengths will be systemat- 
ically too small because the larger end of the experimen- 
tal distributions is cut off by the finite fragment length. 
We dealt with this complication by restricting our fit to
areas where the mean length being measured is less than
10% of the mean fragment size. The second complica- 
tion is that the inferred fork velocity is systematically 
reduced (by about 5% for the fragment size in the exper- 
iments analyzed here). We measured this bias using the 
Monte-Carlo simulations and then corrected the "raw" 
fork velocity that is given by our least-squares fits. 

One further subtle point in a global fit is the relative
weighting to be given to the data in the $\rho(f)$ curves
(Fig. 3) relative to the data in the mean-value curves
(Fig. 2). We estimated the weights using the bootstrap
method. In a similar spirit, we used repeated Monte-Carlo
simulations to estimate statistical errors in experimentally
extracted quantities.
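
As a generic illustration of the bootstrap idea, the Python sketch below resamples a set of measurements with replacement, recomputes a statistic on each resample, and takes the spread of those values as the standard error. Turning that error into a fit weight through inverse-variance weighting is an assumption made here for concreteness; the text does not specify how the weights were formed, and all function names are hypothetical.

```python
import random
import statistics


def bootstrap_standard_error(samples, statistic=statistics.fmean,
                             n_resamples=1000, seed=0):
    """Standard deviation of the statistic over resamples with replacement."""
    rng = random.Random(seed)
    n = len(samples)
    estimates = [statistic(rng.choices(samples, k=n))
                 for _ in range(n_resamples)]
    return statistics.stdev(estimates)


def inverse_variance_weight(samples, **kwargs):
    """Weight = 1 / SE^2 (assumed weighting scheme, for illustration only)."""
    se = bootstrap_standard_error(samples, **kwargs)
    return 1.0 / (se * se)


if __name__ == "__main__":
    rng = random.Random(3)
    # Made-up hole sizes (kb), exponentially distributed with mean 10 kb.
    holes = [rng.expovariate(0.1) for _ in range(400)]
    print(bootstrap_standard_error(holes), inverse_variance_weight(holes))
```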